Abstract

We propose a new lack-of-fit test for quantile regression models that is suitable 
even with high-dimensional covariates. The test is based on the cumulative sum of 
residuals with respect to unidimensional linear projections of the covariates. The 
test adapts concepts proposed by Escanciano (Econometric Theory, 22, 2006) to 
cope with many covariates to the test proposed by He and Zhu (Journal of the 
American Statistical Association, 98, 2003). To approximate the critical values of 
the test, a wild bootstrap mechanism is used, similar to that proposed by Feng et al. 
(Biometrika, 98, 2011). An extensive simulation study was undertaken that shows 
the good performance of the new test, particularly when the dimension of the
covariate is high. The test can also be applied and performs well under heteroscedastic
regression models. The test is illustrated with real data about the economic growth 
of 161 countries. 

Keywords: quantile regression, lack-of-fit testing, high-dimensional covariates. 


1 Introduction 

Let us consider a regression setting where a quantile of the response variable of interest, Y, is 
expressed as a function of a vector of explanatory variables, X. The resulting regression model 
can then be denoted by 

$$Y = g(X) + \varepsilon,$$

where g represents the quantile regression function, and the error, $\varepsilon$, has a conditional $\tau$-quantile
equal to zero, that is, $P(\varepsilon \le 0 \mid X = x) = \tau$ for almost all x.

Quantile regression models have been receiving increased attention in the literature, due 
to their flexibility for general error distributions and because they provide a more detailed 
description of the conditional distribution of the response, compared to classical mean regression. 
Koenker and Bassett (1978) can be considered as the seminal work on the estimation of linear 
quantile regression models. The main idea is to exploit the fact that the $\tau$-quantile, q, of a variable
Y minimizes the expectation

$$E\left(\rho_\tau(Y - q)\right),$$

where $\rho_\tau(r) = \tau r I(r > 0) + (\tau - 1) r I(r < 0)$, and $I(\cdot)$ denotes the indicator function of an
event. Quantile models are estimated by minimizing the sum of the residuals penalized by $\rho_\tau$,
in the same way that the sum of squares is minimized in mean regression. That is, given a sample of
independent observations, $(X_1, Y_1), \ldots, (X_n, Y_n)$, the coefficients of a linear model, $g(x) = x'\theta$
($x'$ denotes the transpose of x), are estimated as the minimizers of

$$\sum_{i=1}^{n} \rho_\tau\left(Y_i - X_i'\theta\right).$$

The same criterion can be applied to estimate general parametric models, where the regression
function is of the type $g(\cdot, \theta)$ and $\theta$ is a parameter to be estimated, and even to nonparametric
estimation of the quantile regression function. See Koenker (2005) for a complete review
on quantile regression methods. 
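
The minimization property behind this criterion can be checked numerically. The following is a minimal sketch in Python, not part of the original study: the function names (`check_loss`, `empirical_risk`, `best_constant`) and the simulated sample are illustrative assumptions. It fits the simplest model, a constant, and verifies that the minimizer of the empirical check loss leaves a fraction of roughly $\tau$ of the observations at or below it.

```python
import random

def check_loss(r, tau):
    """Check (quantile) loss: tau * r for r > 0 and (tau - 1) * r for r < 0."""
    return tau * r if r > 0 else (tau - 1.0) * r

def empirical_risk(q, ys, tau):
    """Average check loss of the residuals y - q."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)

def best_constant(ys, tau):
    """Minimize the empirical risk over the sample points.

    The risk is convex and piecewise linear in q, so a minimizer is
    always attained at one of the observations.
    """
    return min(ys, key=lambda q: empirical_risk(q, ys, tau))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(501)]  # illustrative data
tau = 0.25
q_hat = best_constant(sample, tau)

# Fraction of observations at or below the fitted value; close to tau.
frac_below = sum(y <= q_hat for y in sample) / len(sample)
print(q_hat, frac_below)
```

For a linear model the same loss is summed over the residuals $Y_i - X_i'\theta$ and minimized over $\theta$, which is usually done by linear programming rather than by the brute-force search used here.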

We focus on the problem of testing a parametric model of quantile regression. That is, a 
test of the null hypothesis 


$$H_0 : g \in \mathcal{M}_\theta = \{ g(\cdot, \theta) : \theta \in \Theta \subset \mathbb{R}^q \}$$

versus a nonparametric alternative.

This problem was addressed by He and Zhu (2003), who based their test on the process 

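$$n^{-1/2} \sum_{i=1}^{n} \psi\left(Y_i - g(X_i, \hat\theta)\right) \dot g(X_i, \hat\theta)\, I(X_i \le t),$$

where $\psi(r) = \tau I(r > 0) + (\tau - 1) I(r < 0)$ is the derivative of $\rho_\tau$, $\dot g(x, \theta) = \partial g(x, \theta) / \partial \theta$, and $\hat\theta$ is
an estimator of $\theta$. This is an extension to the quantile regression setting of the cumsum process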
considered by Stute (1997) in the mean regression setting. Zheng (1998) proposed a U-statistic 
of the quantities $\psi(Y_i - g(X_i, \hat\theta))$ with smoothing kernel weights, thereby extending the test of
Zheng (1996) to quantile models. Other specification tests for quantile regression models can be 
found in Horowitz and Spokoiny (2002), Whang (2006), Otsu (2008), Escanciano and Velasco 
(2010), and Escanciano and Goh (2014), among others. 

It is well-known that a high (or even moderate) dimension of the covariate can affect the 
performance of the specification tests. This problem has been addressed by several authors in 
the mean regression setting, where modified tests have been proposed with better properties for 
multiple covariates. In particular, Escanciano (2006) applied a cumsum test to one-dimensional 
projections of the covariates, Lavergne and Patilea (2008) considered similar one-dimensional 
projections for a Zheng type test, and Stute et al. (2008) based their test on the residual empirical 
process marked by proper functions of the regressors. 

Little can be found in the literature on lack-of-fit testing adapted to multidimensional covariates
in the framework of quantile regression. Wilcox (2008) used a He and Zhu type test
and defined some ranks over the covariate. This proposal has the virtue of simplicity but does 
not provide an omnibus test, i.e., it is not consistent for all alternatives. 

We propose and study a lack-of-fit test for parametric models of quantile regression, with 
good properties for multidimensional covariates and consistent for all alternatives. In Section 
2 we present the new He and Zhu type test calculated on one-dimensional projections of the 
covariates. A bootstrap method is also proposed to approximate the critical values of the test. 
Section 3 contains a simulation study where the performance of the test is studied under homo-
and heteroscedastic models, with different error distributions and with increasing dimension of 
the covariate. We compare the proposed test with a He and Zhu test. In Section 4 the test is 
applied to real data, and we provide some concluding remarks and extensions in Section 5. 


2 The proposed method 

2.1 The test 

The strategy to improve the performance of the test with multiple covariates consists of applying
a lack-of-fit test to one-dimensional projections of the covariates. This is motivated by a
fundamental result, which states that the null hypothesis, $H_0 : g \in \mathcal{M}_\theta$, holds if and only if, for
some $\theta_0 \in \Theta \subset \mathbb{R}^q$, and for any $\beta \in \mathbb{R}^d$ with $\|\beta\| = 1$,

$$P\left[ Y - g(X, \theta_0) \le 0 \mid \beta' X \right] = \tau$$

almost surely. This is an immediate extension of Lemma 1 in Escanciano (2006) to the quantile 
regression setting. 

If the true parameter $\theta_0$ were known, the test could be based on the process

$$R_n(\beta, u) = n^{-1/2} \sum_{i=1}^{n} \psi\left(Y_i - g(X_i, \theta_0)\right) \dot g(X_i, \theta_0)\, I(\beta' X_i \le u).$$

Otherwise, an estimator $\hat\theta$ is substituted, yielding the process used for lack-of-fit testing of
the parametric model

$$R_n^1(\beta, u) = n^{-1/2} \sum_{i=1}^{n} \psi\left(Y_i - g(X_i, \hat\theta)\right) \dot g(X_i, \hat\theta)\, I(\beta' X_i \le u).$$

The test statistic is then defined as

$$T_n = \text{largest eigenvalue of } \int_{\Pi} R_n^1(\beta, u) \left[ R_n^1(\beta, u) \right]' F_{n,\beta}(du)\, d\beta, \qquad (1)$$

where $\Pi = \mathbb{S}^d \times [-\infty, +\infty]$, $\mathbb{S}^d$ is the unit sphere in $\mathbb{R}^d$, and $F_{n,\beta}$ is the empirical distribution
of the projected covariates $\beta' X_1, \ldots, \beta' X_n$.

The process $R_n^1$ is similar to that proposed by Escanciano (2006), with two differences: the
loss function is now the quantile loss function, and the gradient vector $\dot g(X_i, \hat\theta)$ is introduced
following the suggestion of He and Zhu (2003).

The limit distribution of $R_n$ under the simple null hypothesis, $H_0 : g = g(\cdot, \theta_0)$ with $\theta_0$
known, can be expressed as

$$R_n \Rightarrow R_\infty,$$

where $R_\infty$ is a Gaussian process with mean zero and covariance given by

$$K(x_1, x_2) = \tau (1 - \tau)\, E\left[ \dot g(X, \theta_0)\, \dot g'(X, \theta_0)\, I(\beta_1' X \le u_1)\, I(\beta_2' X \le u_2) \right],$$

where $x_1 = (\beta_1', u_1)'$ and $x_2 = (\beta_2', u_2)'$. This result can be obtained similarly to Escanciano
(2006), where the tightness comes from the fact that the family of functions in the definition of
$R_n$ is a VC-class of functions.

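Under the composite null hypothesis of a parametric model, $H_0 : g \in \mathcal{M}_\theta$, and under certain
regularity conditions, the following representation can be obtained:

$$R_n^1(\beta, u) = n^{-1/2} \sum_{i=1}^{n} \psi(\varepsilon_i) \left[ I(\beta' X_i \le u) - S(\beta, u) S^{-1} \right] \dot g(X_i, \theta_0) + o_P(1)$$

uniformly in $(\beta, u)$, where $\varepsilon_i = Y_i - g(X_i, \theta_0)$, $i = 1, \ldots, n$, are the errors, $f(0 \mid X)$ denotes the
conditional density of the error at zero, and the matrices $S$ and $S(\beta, u)$ are defined by

$$S = E\left[ f(0 \mid X)\, \dot g(X, \theta_0)\, \dot g'(X, \theta_0) \right],$$

$$S(\beta, u) = E\left[ f(0 \mid X)\, \dot g(X, \theta_0)\, \dot g'(X, \theta_0)\, I(\beta' X \le u) \right].$$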

The proof and the subsequent consequences are a combination of arguments given in He and 
Zhu (2003) and Escanciano (2006). The representation itself is different from that of He and 
Zhu (2003), because we do not assume homoscedasticity. From this representation, the limit 
distribution of the test statistic, T n , under the null hypothesis can be derived. 

Under the alternative, the representation is similar to the previous case, but a new term 
appears which will be crucial to prove the consistency of the test. Let us assume that the data 
come from 

$$Y_i = g(X_i, \theta_0) + n^{-1/2} h(X_i) + \varepsilon_i, \qquad i \in \{1, \ldots, n\},$$

where $\varepsilon_1, \ldots, \varepsilon_n$ are independent errors with conditional $\tau$-quantile equal to zero. The errors
are not assumed to be identically distributed. In particular, their density at zero may depend
on X. With this type of data drawn from the alternative hypothesis, the process admits the
following representation:

$$\begin{aligned}
R_n^1(\beta, u) = {} & n^{-1/2} \sum_{i=1}^{n} \psi(\varepsilon_i) \left[ I(\beta' X_i \le u) - S(\beta, u) S^{-1} \right] \dot g(X_i, \theta_0) \\
& + E\left[ f(0 \mid X)\, h(X)\, \dot g'(X, \theta_0)\, I(\beta' X \le u) \right] \\
& - S(\beta, u) S^{-1} E\left[ f(0 \mid X)\, h(X)\, \dot g'(X, \theta_0) \right] + o_P(1)
\end{aligned}$$

uniformly in $(\beta, u)$. The second and third summands on the right-hand side are non-random terms
reflecting the deviation from the null hypothesis. If the data come from

$$Y_i = g(X_i, \theta_0) + c_n n^{-1/2} h(X_i) + \varepsilon_i, \qquad i \in \{1, \ldots, n\},$$

where $c_n$ is a sequence of real numbers converging to infinity (at any rate), then the test statistic,
$T_n$, will converge to infinity and the power of the test will converge to one. To obtain this consistency,
it is assumed that the sequence $g(x, \theta_0) + c_n n^{-1/2} h(x)$ does not coincide with any element
of the parametric model, $\mathcal{M}_\theta = \{ g(\cdot, \theta) : \theta \in \Theta \subset \mathbb{R}^q \}$, and that
$\mathrm{Var}(f(0 \mid X)\, h(X)\, \dot g'(X, \theta)) > 0$ for any $\theta$.


2.2 Bootstrap approximation

The approximation of critical values is a crucial issue in lack-of-fit testing. One possible solution 
would be to use the limit distribution. However, this would require an estimate of the limit 
variance which involves the estimation of complicated unknown quantities. Furthermore, the 
convergence to the limit distribution could be slow. Another possibility could be to use the 
representations as given above. Then, a bootstrap method based on multipliers can be considered 
(see He and Zhu (2003)). The approximation by a multipliers bootstrap is generally better than 
the limit distribution, but still requires estimating many unknown quantities. He and Zhu (2003) 
assume homoscedasticity, so the conditional density of the error at zero, $f(0 \mid X)$, does not have
to be estimated. On the other hand, Escanciano and Goh (2014) allow for heteroscedasticity
and use a multipliers bootstrap, which requires an estimate of the conditional density $f(0 \mid X)$
by a smoothing method. 

We propose a bootstrap approximation based on drawing new bootstrap samples,
$(X_1, Y_1^*), \ldots, (X_n, Y_n^*)$, where

$$Y_i^* = g(X_i, \hat\theta) + \varepsilon_i^*, \qquad i = 1, \ldots, n,$$

$\hat\theta$ is the parameter estimate obtained from the original sample, and $\varepsilon_i^* = w_i |r_i|$, where
$r_i = Y_i - g(X_i, \hat\theta)$ are the residuals from the original sample. The multipliers, $w_i$, are independently
generated from a common distribution with $\tau$-quantile equal to zero. Following the proposal
by Feng et al. (2011), the absolute values of the residuals are used to construct the bootstrap
errors, which is a convenient modification of the wild bootstrap for quantile regression. Regarding
the multipliers distribution, we adopt the two-point distribution with probabilities $(1 - \tau)$ and
$\tau$ at $2(1 - \tau)$ and $-2\tau$, respectively, that was proposed by Feng et al. (2011) to satisfy their
Conditions 3, 4 and 5. Note that other common multipliers distributions for mean regression, 
generally with the only condition that the variance is one and occasionally with the condition 
that the third moment is one (see Mammen (1993) for a two-point multipliers distribution in the 
mean regression), do not satisfy Conditions 4 and 5 required by Feng et al. (2011) to establish 
consistency of the bootstrap for quantile regression. 
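
The construction of the bootstrap errors can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the function names `draw_multiplier` and `wild_bootstrap_errors` are hypothetical, and the residuals are taken as given. The sketch draws the two-point multipliers described above and checks empirically that a share of about $\tau$ of them is negative, so that their $\tau$-quantile is zero.

```python
import random

def draw_multiplier(tau, rng):
    """Two-point weight: 2(1 - tau) with probability 1 - tau, -2*tau with probability tau."""
    return 2.0 * (1.0 - tau) if rng.random() < 1.0 - tau else -2.0 * tau

def wild_bootstrap_errors(residuals, tau, rng):
    """Bootstrap errors w_i * |r_i|, built from the absolute residuals."""
    return [draw_multiplier(tau, rng) * abs(r) for r in residuals]

rng = random.Random(1)
tau = 0.3
weights = [draw_multiplier(tau, rng) for _ in range(20000)]

# Share of negative multipliers; approximately tau.
share_negative = sum(w < 0 for w in weights) / len(weights)
print(share_negative)

# Bootstrap errors for a few illustrative residuals.
print(wild_bootstrap_errors([0.4, -1.2, 0.1], tau, rng))
```

Adding these errors to the fitted values $g(X_i, \hat\theta)$ gives the bootstrap responses; the covariates are kept fixed.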

The advantage of the proposed bootstrap approximation for the lack-of-fit test, in comparison 
to existing methods such as those proposed by He and Zhu (2003) and Escanciano and Goh 
(2014), is that it allows consideration of heteroscedastic regression models of any type without 
needing to estimate complicated quantities in the representations, and in particular without 
estimating the conditional density $f(0 \mid X)$ by smoothing methods.

Once the bootstrap sample is generated, the test statistic is computed in the same way as 
for the original sample, obtaining $T_n^*$. If a number, B, of bootstrap samples are generated, then
$T_n^{*1}, \ldots, T_n^{*B}$ represent the B bootstrap values of the test statistic. The p-value of the test
may be approximated by the proportion of bootstrap values not smaller than the original test
statistic, i.e., $(1/B) \sum_{b=1}^{B} I(T_n \le T_n^{*b})$.
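
As a small illustration of this last step, the p-value can be computed as follows. The function name `bootstrap_p_value` and the numerical values are hypothetical; the sketch assumes that the observed statistic and the B bootstrap statistics have already been computed.

```python
def bootstrap_p_value(t_obs, t_boot):
    """Proportion of bootstrap statistics not smaller than the observed statistic."""
    return sum(t_star >= t_obs for t_star in t_boot) / len(t_boot)

# Illustrative values only.
t_obs = 2.0
t_boot = [0.5, 1.0, 2.0, 3.0]
p_value = bootstrap_p_value(t_obs, t_boot)
reject_at_5_percent = p_value < 0.05
print(p_value, reject_at_5_percent)
```

The null hypothesis is rejected at nominal level $\alpha$ when the approximated p-value falls below $\alpha$.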

The validity of this bootstrap mechanism comes from the representation of the process $R_n^1$
under the null hypothesis, in terms of the true errors plus the parameter estimation error,

$$R_n^1(\beta, u) = n^{-1/2} \sum_{i=1}^{n} \psi(\varepsilon_i)\, \dot g(X_i, \theta_0)\, I(\beta' X_i \le u) - S(\beta, u) \sqrt{n}\, (\hat\theta - \theta_0) + o_P(1)$$

uniformly in $(\beta, u)$. A similar representation can be derived for the bootstrap process conditionally
on the original sample, where the convergence of the bootstrap version of the estimation
error, $\sqrt{n}\,(\hat\theta^* - \hat\theta)$, was established in Theorem 1 of Feng et al. (2011).


2.3 Computational aspects 

Tests that face the curse of dimension usually require additional algorithms over other more 
common model checks. In particular, Escanciano (2006) and Stute et al. (2008) are based on 
Stute (1997)’s test and require additional computations over this original method. Similarly, 
Lavergne and Patilea (2008) is a test for high-dimensional covariates that is based on Zheng 
(1996)’s test, and requires an optimization algorithm over a set of Zheng-type statistics. The 
proposed method here is an adaptation of He and Zhu (2003)’s test to high-dimensional covariates 
with a procedure similar to that given by Escanciano (2006). One important virtue of this 
procedure is the ease of computation and that the amount of computations does not grow 
dramatically with the dimension of the covariate. To illustrate this, recall that our test statistic,
$T_n$, was defined in (1) as the largest eigenvalue of a Cramér-von Mises norm of the process $R_n^1$.
Following Escanciano (2006), one can show that $T_n$ can be expressed as

$$T_n = \text{largest eigenvalue of } n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} \psi\left(Y_i - g(X_i, \hat\theta)\right) \psi\left(Y_j - g(X_j, \hat\theta)\right) \dot g(X_i, \hat\theta)\, \dot g'(X_j, \hat\theta)\, A_{ij\bullet},$$

where $A_{ij\bullet}$ is given by

$$A_{ij\bullet} = \sum_{r=1}^{n} A_{ijr}^{(0)}\, \frac{\pi^{d/2 - 1}}{\Gamma(d/2 + 1)},$$

where $A_{ijr}^{(0)}$ is the complementary angle between the vectors $(X_i - X_r)$ and $(X_j - X_r)$ measured
in radians, $\Gamma$ is the gamma function, and d is the dimension of the covariate, X. Thus, the total
number of computations required to obtain the test statistic depends on the dimension, d, only 
at a linear rate, which is the same rate required by He and Zhu (2003)’s test, and much less 
than the optimization in d dimensions required by other methods in the literature. Note also 
that the matrix $(A_{ij\bullet})$, which is the most expensive in computation time, does not need to be
computed for each bootstrap sample. All these computational properties are particularly useful
in the case of high-dimensional or functional covariates; see García-Portugués et al. (2013) for an
illustration in the mean regression functional context. 

Table 1 shows the mean of the times required by 1000 original samples with B = 500 
bootstrap replications, in units of seconds per original sample. The data are drawn from Model 
8, whose details are given in the next section, and the sample size is n = 100. The dimension of 
the covariate is d = t + 2. As expected, the new test requires more computations than He and 
Zhu (2003)’s test, but the differences are quite small, and the amount of computations does not 
dramatically grow with the dimension, even for very large dimensions. The gain of power from 
the new test, shown in the next section, justifies the small increase in the computation time. 


3 Simulation study 

We study the performance of our proposed method under the null and the alternative hypotheses 
using a Monte Carlo simulation. In all experiments, the number of simulated original samples
was 1000, the number of bootstrap replications was B = 500, and the multipliers for the bootstrap
approximation followed the two-point distribution given in Section 2.2.


                 t = 0   t = 2   t = 6   t = 10   t = 20   t = 30   t = 40   t = 50
Proposed test    2.76    2.84    2.85    2.91     2.91     3.10     3.20     3.38
HZ test          2.71    2.51    2.81    2.56     2.92     2.83     2.85     2.77

Table 1: Computational times (seconds per sample) associated with our proposed lack-of-fit
test (Proposed test) and with the test proposed by He and Zhu (2003) (HZ test) as a
function of the dimension t + 2 of the covariate.


                      Model 1                        Model 2                        Model 3
           α = 0.10  α = 0.05  α = 0.01   α = 0.10  α = 0.05  α = 0.01   α = 0.10  α = 0.05  α = 0.01
n = 25     0.096     0.049     0.002      0.119     0.066     0.017      0.107     0.061     0.014
n = 50     0.112     0.047     0.008      0.112     0.053     0.014      0.099     0.045     0.005
n = 100    0.102     0.058     0.016      0.094     0.047     0.011      0.107     0.049     0.010
n = 150    0.089     0.048     0.007      0.104     0.056     0.014      0.096     0.055     0.015
n = 200    0.100     0.048     0.010      0.106     0.049     0.010      0.100     0.054     0.015

Table 2: Proportions of rejections associated with our proposed lack-of-fit test for Models
1, 2 and 3.

We first focus on the behavior under the null hypothesis, to check the adjustment of the 
significance level. We simulate values from the following quantile regression models with $\tau = 0.5$:

Model 1: $Y = 1 + X_1 + X_2 + \varepsilon$,

Model 2: $Y = 1 + X_1 + X_2 + X_3 + X_4 + X_5 + \varepsilon$,

Model 3: $Y = 1 + X_1 + X_2 +$ [...],

where $X_i \in \mathrm{Uniform}(0, 1)$ for $i \in \{1, \ldots, 5\}$, and they are mutually independent; $f(x) = x + 0.5$;
and $\varepsilon \in N(0, 1)$ is the unknown error, which is drawn independently of the covariates. In
Models 1 and 3 the null hypothesis is the linear model in $X_1$ and $X_2$ versus an alternative that
includes any dependence of Y on $X_1$ and $X_2$. In Model 2 the null hypothesis is the linear model
in the five explanatory variables versus any dependence on them. Model 1 represents a common 
homoscedastic model with small dimension of the covariate. Model 2 is intended to show the 
possible effect of a larger dimension on the level. Model 3 is useful to show the possible effect 
of heteroscedasticity on the level. 

Table 2 shows the proportions of rejections associated with different sample sizes, n, and 
for different nominal significance levels, $\alpha$. The proposed test works well in a homoscedastic
context (Models 1 and 2) as well as in a heteroscedastic context (Model 3) even for small sample
sizes. Comparing Models 1 and 2, the increase of the dimension of the explanatory variables
does not have a negative impact on the adjustment of the significance level of the test. These
findings are important, because our bootstrap mechanism was designed to work under heteroscedastic
models and the aim of the test itself was to be applied for larger dimensions of the covariate. 

                                         τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
ε ∈ Centered Standard Normal   n = 50    0.048      0.061      0.047      0.063      0.043
                               n = 150   0.052      0.053      0.048      0.057      0.051
ε ∈ Centered Log-Normal        n = 50    0.057      0.053      0.042      0.055      0.057
                               n = 150   0.041      0.053      0.052      0.051      0.057
ε ∈ Centered Exponential       n = 50    0.058      0.054      0.048      0.057      0.053
                               n = 150   0.060      0.056      0.059      0.049      0.046

Table 3: Proportions of rejections associated with our lack-of-fit test for Model 1, for
different error distributions and different quantiles, with nominal level α = 0.05.


Table 3 provides the same proportions of rejections for different error distributions and 
quantiles, restricted to Model 1 and nominal level $\alpha = 0.05$. The error distributions are centered
standard normal, centered log-normal, and centered exponential with expectation one. That is,
$\varepsilon = Z - z_\tau$, where Z follows a standard normal, log-normal, and exponential with expectation
one, respectively, and $z_\tau$ is the $\tau$-quantile of the Z-distribution. The nominal level is respected
under the null hypothesis for all the error distributions and quantile orders considered.
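
The centering of the errors can be sketched as follows. The code is an illustration, not the simulation code of the study: the function names and the string labels for the distributions are hypothetical, and the quantiles $z_\tau$ are obtained from standard closed forms (the normal quantile function, its exponential for the log-normal, and $-\log(1 - \tau)$ for the exponential with expectation one).

```python
import math
import random
from statistics import NormalDist

def quantile_of_z(dist, tau):
    """tau-quantile z_tau of the uncentered variable Z."""
    if dist == "normal":
        return NormalDist().inv_cdf(tau)
    if dist == "lognormal":
        return math.exp(NormalDist().inv_cdf(tau))
    if dist == "exponential":
        return -math.log(1.0 - tau)
    raise ValueError("unknown distribution: " + dist)

def draw_z(dist, rng):
    """One draw of Z from the chosen distribution."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "lognormal":
        return math.exp(rng.gauss(0.0, 1.0))
    if dist == "exponential":
        return rng.expovariate(1.0)
    raise ValueError("unknown distribution: " + dist)

def centered_errors(dist, tau, n, rng):
    """Errors Z - z_tau, whose tau-quantile is zero by construction."""
    z_tau = quantile_of_z(dist, tau)
    return [draw_z(dist, rng) - z_tau for _ in range(n)]

rng = random.Random(3)
tau = 0.75
shares = {}
for dist in ("normal", "lognormal", "exponential"):
    errors = centered_errors(dist, tau, 20000, rng)
    # Share of errors at or below zero; approximately tau for every distribution.
    shares[dist] = sum(e <= 0 for e in errors) / len(errors)
print(shares)
```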

We now study the performance of the new test under the alternative. To this end, the new 
test will be compared with that of He and Zhu (2003). Before doing so, we must remember 
that He and Zhu (2003) suggested a bootstrap calibration of their test based on an asymptotic 
representation of the empirical process in a homoscedastic setting. We will verify whether this manner
of calibrating the test allows a good fit to the significance level for heteroscedastic models. We
simulate values from the following regression model with $\tau = 0.5$ under the null hypothesis of
linearity:

Model 4: $Y = 1 + X_1 + f(X_1)\, \varepsilon$,

where $X_1 \in \mathrm{Uniform}(0, 1)$, $f(x) = x + 0.5$, $\varepsilon \in N(0, 1)$, and $X_1$ and $\varepsilon$ are independent.
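
To make the heteroscedastic design concrete, data from Model 4 can be generated as in the following sketch. The function names and the sample size are illustrative assumptions. The sketch also checks two features of the model: the conditional median of Y is $1 + X_1$, and the spread of the errors grows with $X_1$.

```python
import random

def scale(x):
    """Scale function f(x) = x + 0.5 of Model 4."""
    return x + 0.5

def simulate_model4(n, rng):
    """Draw pairs (X1, Y) with Y = 1 + X1 + f(X1) * eps and eps standard normal."""
    data = []
    for _ in range(n):
        x1 = rng.random()
        eps = rng.gauss(0.0, 1.0)
        data.append((x1, 1.0 + x1 + scale(x1) * eps))
    return data

rng = random.Random(4)
sample = simulate_model4(20000, rng)

# Share of responses at or below the median line 1 + X1; approximately 0.5.
share_below = sum(y <= 1.0 + x for x, y in sample) / len(sample)

# Mean absolute error for small and for large X1; larger for large X1.
low = [abs(y - 1.0 - x) for x, y in sample if x < 0.2]
high = [abs(y - 1.0 - x) for x, y in sample if x > 0.8]
mean_abs_low = sum(low) / len(low)
mean_abs_high = sum(high) / len(high)
print(share_below, mean_abs_low, mean_abs_high)
```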

The proportions of rejections associated with the test proposed by He and Zhu (2003) are 
shown in Table 4 for different sample sizes and nominal significance levels. The bootstrap method 
proposed by He and Zhu (2003) does not work well in a heteroscedastic context. This is due to 
their representation being only valid under homoscedasticity. However, the proposed bootstrap 
(Section 2.2) works well for their test also under heteroscedasticity. Therefore, with the aim to 
make a fair comparison between our proposal and He and Zhu (2003)’s test, subsequently we 
use a wild bootstrap as given in Section 2.2 to calibrate both lack-of-fit tests. 

Once the adjustment of the level of both lack-of-fit tests has been studied, we analyze 
their performance under the alternative hypothesis. Consider the following regression model 
associated with quantiles of different orders, $\tau$:

Model 5: $Y = 1 + \frac{1}{5}(X_1 - X_2) + \varepsilon_\tau$,

where $X_1, X_2 \in N(0, 1)$ and they are independent, and $\varepsilon_\tau = Z - z_\tau$, where $z_\tau$ is the $\tau$-quantile of
the variable Z. Z is drawn independently of $X_1$ and $X_2$. Three possibilities are considered for
the distribution of Z: standard normal, uniform on the interval $(-1, 1)$, and chi-squared with
four degrees of freedom. 








                  Wild bootstrap                  Bootstrap proposed
                  of Section 2.2                  in He and Zhu (2003)
           α = 0.10  α = 0.05  α = 0.01    α = 0.10  α = 0.05  α = 0.01
n = 25     0.103     0.057     0.014       0.441     0.305     0.142
n = 50     0.116     0.064     0.015       0.263     0.164     0.067
n = 100    0.094     0.051     0.013       0.167     0.092     0.033
n = 150    0.104     0.051     0.010       0.155     0.085     0.025
n = 200    0.103     0.051     0.014       0.136     0.080     0.026

Table 4: Proportions of rejections associated with the test proposed by He and Zhu (2003)
for the heteroscedastic Model 4 with two types of bootstrap approximations.


Table 5 shows the proportions of rejections for several quantiles and the three error distributions,
when the tests are applied to check the no-effect model, i.e., to check the null hypothesis
that the quantile regression function is a constant not depending on the covariates. The sample 
size is fixed to n = 100. We consider a relatively simple hypothesis and a simple deviation 
under the alternative, to facilitate the comparison between quantiles of different orders, and to 
evaluate the effect of the error distribution. 

The proposed test is more powerful than He and Zhu (2003)’s test for any of the quantiles 
and for the three error distributions. The power of the proposed test is symmetric with respect 
to the order of the quantile around 0.5 for the symmetric error distributions, which are the 
standard normal and the uniform in Table 5. For the standard normal error distribution, the 
proposed test is more powerful for the central quantiles (around 0.5), which can be explained by 
the higher density at these quantiles. For the uniform error distribution, the density is constant 
with respect to the quantile, while the factor $\tau(1 - \tau)$ appearing in the asymptotic distribution
of the proposed test makes the test more powerful for the external quantiles (with orders close 
to 0 or 1). For the chi-squared error distribution, the proposed test is more powerful for the 
quantiles with smaller order, since the error distribution is asymmetric with higher density at 
these quantiles. 

We now consider a linear model under the null hypothesis and a quadratic deviation under 
the alternative. The deviation is multiplied by a value c > 0, to evaluate the effect of the 
deviation on the power of the test. 

Model 6: $Y = 1 + X_1 + X_2 + c\,(X_1^2 + X_2^2 + X_1 X_2) + \varepsilon_\tau$,

where $X_1 \in \mathrm{Uniform}(0, 1)$, $X_2 \in N(0, 1)$; $\varepsilon_\tau$ follows a log-normal distribution centered at the
$\tau$-quantile, i.e., $\varepsilon_\tau = e^{Z} - e^{z_\tau}$, where $Z \in N(0, 1)$ and $z_\tau$ is the $\tau$-quantile of the variable Z;
and $X_1$, $X_2$ and $\varepsilon_\tau$ are drawn independently.

Figure 1 shows the power of the proposed test and of He and Zhu (2003)’s test as functions
of the value of c, for five orders of the quantile: 0.1, 0.25, 0.5, 0.75, and 0.9. The nominal
level is $\alpha = 0.05$ and the sample size is fixed to n = 150. As expected, the power increases with
c. The new test is more powerful than He and Zhu (2003)’s test for any value of c and for any
of the considered quantiles. Both tests are more powerful for central quantiles (orders close to
0.5). Symmetry around 0.5 is not strictly satisfied, since the error distribution is not symmetric
around the median, and the deviation from the null hypothesis is more complex than that given
in Model 5.


                                          Proposed test                     HZ test
                                 α = 0.10  α = 0.05  α = 0.01    α = 0.10  α = 0.05  α = 0.01
Z ∈ N(0, 1)          τ = 0.10    0.346     0.229     0.092       0.183     0.094     0.023
                     τ = 0.25    0.498     0.362     0.180       0.210     0.121     0.030
                     τ = 0.50    0.575     0.444     0.231       0.208     0.110     0.032
                     τ = 0.75    0.487     0.377     0.200       0.191     0.096     0.016
                     τ = 0.90    0.357     0.245     0.102       0.128     0.052     0.007
Z ∈ Uniform(-1, 1)   τ = 0.10    0.930     0.885     0.707       0.524     0.335     0.112
                     τ = 0.25    0.866     0.789     0.593       0.397     0.242     0.066
                     τ = 0.50    0.809     0.691     0.475       0.325     0.186     0.056
                     τ = 0.75    0.877     0.795     0.587       0.381     0.229     0.054
                     τ = 0.90    0.945     0.872     0.693       0.382     0.193     0.027
Z ∈ χ²(4)            τ = 0.10    0.315     0.207     0.076       0.144     0.078     0.018
                     τ = 0.25    0.245     0.144     0.045       0.124     0.056     0.015
                     τ = 0.50    0.208     0.124     0.041       0.112     0.058     0.012
                     τ = 0.75    0.141     0.070     0.022       0.115     0.058     0.017
                     τ = 0.90    0.137     0.077     0.028       0.120     0.064     0.015

Table 5: Proportions of rejections associated with our proposed lack-of-fit test (Proposed
test) and with the test proposed by He and Zhu (2003) (HZ test) for Model 5.

We also consider different deviations from the linear null hypothesis and different error
distributions, through Model 7.


Model 7: $Y = 1 + X_1 + X_2 + h(X) + \varepsilon$,

where $X_1 \in \mathrm{Uniform}(0, 1)$, $X_2 \in N(0, 1)$; $\varepsilon = Z - z_\tau$, with $z_\tau$ being the $\tau$-quantile of the
variable Z; and $X_1$, $X_2$, and Z are drawn independently. For the deviation h(X), a quadratic
function including interaction is considered, as well as a sine, exponential, and logarithm function
of the linear transformation $l(x) = 1 + x_1 + x_2$ (see Table 6). For the distribution of Z, the
log-normal, chi-squared with two degrees of freedom, exponential with expectation one, and a 
mixture of normal distributions are considered. The mixture is obtained as a standard normal 
with probability 0.75 and a normal distribution with mean 5 and standard deviation 2 with 
probability 0.25. 

The proposed test and He and Zhu (2003)’s test are applied to check the null hypothesis of
linearity on $X_1$ and $X_2$ with nominal level $\alpha = 0.05$. Results for the proportions of rejections
are given in Table 6. For almost every combination of deviation, error distribution and quantile,
the proposed test is more powerful than He and Zhu (2003)’s; in the few exceptions, all at
$\tau = 0.75$, both tests have low power and the differences are small.

[Figure 1 panels: one per quantile order (τ = 0.1, τ = 0.25, ...); horizontal axis: c]

Figure 1: Proportion of rejections associated with our proposed lack-of-fit test (Proposed
test) and the test proposed by He and Zhu (2003) (HZ test) for Model 6 depending on
the parameter c and the τ-quantile of interest.


            Proposed  HZ       Proposed  HZ       Proposed  HZ       Proposed  HZ
τ = 0.25    0.373     0.162    0.199     0.097    0.448     0.184    0.135     0.08
τ = 0.5     0.577     0.364    0.345     0.208    0.696     0.435    0.287     0.17
τ = 0.75    0.309     0.217    0.200     0.150    0.490     0.365    0.074     0.06

τ = 0.25    0.994     0.910    0.934     0.705    1.000     0.962    0.702     0.38
τ = 0.5     0.981     0.899    0.849     0.619    0.999     0.976    0.829     0.61
τ = 0.75    0.773     0.579    0.516     0.361    0.952     0.831    0.138     0.11

τ = 0.25    0.443     0.409    0.425     0.414    0.461     0.429    0.381     0.35
τ = 0.5     0.562     0.321    0.458     0.270    0.607     0.353    0.390     0.23
τ = 0.75    0.124     0.066    0.106     0.061    0.157     0.081    0.103     0.05

τ = 0.25    1.000     0.996    1.000     0.990    1.000     0.999    0.998     0.98
τ = 0.5     1.000     0.997    1.000     0.998    1.000     1.000    1.000     0.95
τ = 0.75    0.865     0.419    0.811     0.380    0.982     0.637    0.586     0.22

τ = 0.25    0.190     0.113    0.154     0.112    0.169     0.135    0.133     0.10
τ = 0.5     0.411     0.251    0.254     0.167    0.483     0.268    0.225     0.16
τ = 0.75    0.244     0.145    0.164     0.097    0.382     0.251    0.102     0.08

τ = 0.25    0.917     0.498    0.788     0.378    0.963     0.577    0.533     0.28
τ = 0.5     0.980     0.747    0.797     0.455    0.998     0.868    0.759     0.45
τ = 0.75    0.700     0.450    0.493     0.325    0.955     0.744    0.216     0.13

τ = 0.25    0.820     0.627    0.736     0.570    0.874     0.678    0.622     0.48
τ = 0.5     0.396     0.306    0.291     0.200    0.561     0.398    0.227     0.18
τ = 0.75    0.090     0.094    0.098     0.086    0.122     0.104    0.068     0.07

τ = 0.25    1.000     0.998    0.999     0.987    1.000     1.000    0.997     0.97
τ = 0.5     0.897     0.757    0.751     0.558    0.971     0.875    0.660     0.47
τ = 0.75    0.166     0.171    0.167     0.166    0.297     0.196    0.112     0.13

Table 6: Proportions of rejections associated with our proposed lack-of-fit test (Proposed)
and with the test proposed by He and Zhu (2003) (HZ) for Model 7.


Our main purpose in proposing a new lack-of-fit test was to overcome the curse of dimensionality.
Thus, the new test should show an acceptable power for increasing dimensionality of
the covariate. To check this, we simulate values of the following median regression model:


Model 8: 




1 

3 



+ XiX 2 + X 2 


+ 


where our goal is to carry out the following lack-of-fit test:

$$H_0 : Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \varepsilon$$

$$H_a : Y = g(X_1, X_2, X_{2+1}, \ldots, X_{2+t}) + \varepsilon,$$

where $X_i \in \mathrm{Uniform}(0, 1)$ if i is odd, and $X_i \in N(0, 1)$ if i is even; the error is drawn from
the centered log-normal distribution, i.e., $\varepsilon = e^{Z} - 1$ where $Z \in N(0, 1)$; g is any smooth
(nonparametric) function of the covariates; and t represents the number of additional covariates 
in the alternative, and so is the additional dimension where the test is looking for deviations 
from the null. It would be expected that increased value of t implies decreased power of the 
test. 

Table 7 shows the proportions of rejections associated with the new test and He and Zhu 
(2003)’s test, for different values of the additional dimension, t. Both tests suffer a loss of 
power due to the increase of the dimension, as expected. Nonetheless, the loss of power is more 
pronounced for the test proposed by He and Zhu (2003). For example, from dimension t = 6 
the proportion of rejections associated with their test is near to the significance level, whereas 
our proposed test preserves noticeable power, even for very high dimensions. 

Note that, for very high dimensions, the test statistic of He and Zhu (2003) is almost degenerate, 
because for any observation of the covariate, $X_i$, the indicators $I(X_j \le X_i)$ involved in the 
computation of their test process at $X_i$ will be zero for most of the other observations $X_j$ when 
the dimension of the covariates $X_i$ and $X_j$ is large. Thus, the test is unable to make a reasonable 
number of evaluations to check the model, and its power is consequently destroyed, as observed 
in Table 7 for $t > 10$. On the other hand, our proposed method is able to make comparisons even 
for large dimensions of the covariate, because the indicators are calculated with unidimensional 
projections of the covariate. We conclude that the proposed method constitutes a necessary 
modification of He and Zhu (2003) when the dimension of the covariate is large. 
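The degeneracy argument can be checked numerically. The sketch below is not the test statistic of either method. It only counts, for simulated covariates, the fraction of pairs for which the multivariate indicator $I(X_j \le X_i)$ (componentwise) equals one, and compares it with the fraction obtained from a projected indicator $I(\beta' X_j \le \beta' X_i)$. Standard normal covariates, a single random direction $\beta$, and the name `mean_indicator_fraction` are assumptions made for illustration.

```python
import numpy as np

def mean_indicator_fraction(X, rng, projected):
    """Fraction of ordered pairs (i, j), j != i, whose indicator equals one."""
    n, d = X.shape
    if projected:
        beta = rng.standard_normal(d)
        beta /= np.linalg.norm(beta)           # random direction on the unit sphere
        s = X @ beta
        ind = s[None, :] <= s[:, None]         # ind[i, j] = I(beta'X_j <= beta'X_i)
    else:
        # ind[i, j] = I(X_j <= X_i) in every coordinate
        ind = np.all(X[None, :, :] <= X[:, None, :], axis=2)
    np.fill_diagonal(ind, False)               # drop the trivial pairs j == i
    return ind.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
n = 300
results = {}
for d in (2, 6, 12):
    X = rng.standard_normal((n, d))
    results[d] = (mean_indicator_fraction(X, rng, projected=False),
                  mean_indicator_fraction(X, rng, projected=True))
    print(d, results[d])
```

With independent continuous coordinates, the probability that one observation is componentwise below another is $2^{-d}$, so the multivariate fraction collapses quickly as $d$ grows, while the projected fraction stays at one half whatever the dimension.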


4 Application to real data 

The proposed method is applied to real data on the evolution of the Gross Domestic Product 
(GDP) in several countries. GDP is an economic indicator that reflects the monetary value of 
the final goods and services produced by an economy in a certain period, and it is used as a 
measure of the material well-being of a society. Different median regression models have been 
proposed to explain the annual growth rate of the Per Capita GDP in terms of a number of 
explanatory variables, including the initial Per Capita GDP and diverse economic and social 
indicators. 

We focus on the model of Koenker and Machado (1999), based on the available information 
included in Barro and Lee (1994). A complete study of this economic model is given by Barro 
and Sala-i-Martin (1995). The aim of Koenker and Machado (1999) was to check the combined 
effect of the different explanatory variables on the response in a quantile regression model. Here 
we test the specification of the quantile regression model itself. 
The dataset we use, barro, is available in the R package quantreg 
(http://cran.at.r-project.org/web/packages/quantreg/). This data set contains measurements associated 
with 71 countries during the period 1965-1975 and 90 countries during the period 1975-1985, 
yielding a total sample size of n = 161 countries. 

The explanatory variables used to explain the median of the annual growth of the Per Capita 
GDP (the response variable, Y) can be split into two groups, as given below. More details about 
these variables and their role in the model for GDP can be found in Barro and Sala-i-Martin 
(1995). 

State variables: These variables reflect characteristics of the different countries that cannot 
be directly decided by political or social agents. They are measures of the steady-state 
position of the country, such as human capital, education or health. Koenker and Machado 
(1999) consider the following variables in this group: 

$X_1$ := log(Initial Per Capita GDP) 
$X_2$ := Male Secondary Education 
$X_3$ := Female Secondary Education 
$X_4$ := Female Higher Education 
$X_5$ := Male Higher Education 
$X_6$ := Life Expectancy 
$X_7$ := Human Capital 

Control and environmental variables: These variables are direct consequences of decisions 
made by government or private agents. The variables included in this group are 

$X_8$ := Education/GDP 
$X_9$ := Investment/GDP 
$X_{10}$ := Public Consumption/GDP 
$X_{11}$ := Black Market Premium 
$X_{12}$ := Political Instability 
$X_{13}$ := Growth Rate Terms Trade 

We apply the AIC criterion proposed by Hurvich and Tsai (1990) to select variables among 
the thirteen explanatory variables for the quantile regression model, and retain only those 
variables that appear relevant for the response. Based on this criterion, we propose a model 
that includes the variables $X_i$ with $i \in \mathcal{I}_1 = \{1, 2, 6, 7, 9, 10, 11, 12, 13\}$. 
We apply the proposed lack-of-fit test in four different testing problems: 


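$$\text{Problem 1:} \quad \begin{cases} H_0: Y = \theta_0 + \sum_{i=1}^{13} X_i \theta_i + \varepsilon_1 \\ H_a: Y = g(X_1, X_2, \ldots, X_{13}) + \varepsilon_1 \end{cases}$$

$$\text{Problem 2:} \quad \begin{cases} H_0: Y = \theta_0 + \sum_{i \in \mathcal{I}_1} X_i \theta_i + \varepsilon_2 \\ H_a: Y = g(X_i : i \in \mathcal{I}_1) + \varepsilon_2 \end{cases}$$

$$\text{Problem 3:} \quad \begin{cases} H_0: Y = \theta_0 + \sum_{i \in \mathcal{I}_1} X_i \theta_i + \varepsilon_3 \\ H_a: Y = g(X_1, X_2, \ldots, X_{13}) + \varepsilon_3 \end{cases}$$

$$\text{Problem 4:} \quad \begin{cases} H_0: Y = \theta_0 + \sum_{i \in \mathcal{I}_2} X_i \theta_i + \varepsilon_4 \\ H_a: Y = g(X_1, X_2, \ldots, X_{13}) + \varepsilon_4 \end{cases}$$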


where $\mathcal{I}_2 = \{1, 2, 3, 4, 5, 6, 7\}$ (state variables). Problem 1 is a lack-of-fit test of the linear model 
versus a nonparametric alternative, including all thirteen explanatory variables under both 
the null and alternative hypotheses. Problem 2 is a lack-of-fit test of the linear model versus 
a nonparametric alternative, including only the nine variables in the set $\mathcal{I}_1$. Problem 3 is the 
same test as Problem 2, but with an alternative in the thirteen original variables. Problem 4 is 
a lack-of-fit test of a linear model that only includes the state variables. 

Table 8 contains the p-values obtained from the application of the proposed lack-of-fit test 
to each of the testing problems. The number of bootstrap replications was B = 500. The null 
hypothesis is not rejected in Problems 1, 2 and 3. In Problem 3, the model under the null is the 
simplest, while the model under the alternative is the most complex. Despite this, the p-value 
is quite large, so we find no evidence against the simple model with the nine explanatory variables 
in the set $\mathcal{I}_1$: there is no significant deviation from this model arising from any 
(smooth) function of the thirteen possible explanatory variables. 

On the other hand, the null hypothesis is rejected for Problem 4. Thus, a model that only 
includes the state variables is insufficient to explain the evolution of the GDP, that is, some of 
the control or environmental variables are necessary. 
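To make the role of the B = 500 bootstrap replications concrete, the following sketch shows one common way of turning bootstrap statistics into a p-value: the proportion of bootstrap replicates that are at least as large as the observed statistic. This convention, the function name `bootstrap_p_value` and the placeholder statistics are assumptions for illustration; they are not taken from the analysis of the GDP data.

```python
import numpy as np

def bootstrap_p_value(t_obs, t_boot):
    """Proportion of bootstrap statistics at least as large as the observed one."""
    t_boot = np.asarray(t_boot, dtype=float)
    return float(np.mean(t_boot >= t_obs))

rng = np.random.default_rng(0)
B = 500
t_boot = rng.chisquare(df=3, size=B)   # placeholder bootstrap statistics (hypothetical)
t_obs = 4.0                            # placeholder observed statistic (hypothetical)
p = bootstrap_p_value(t_obs, t_boot)
print(p)                               # always a multiple of 1 / B
```

Under this convention every p-value is a multiple of 1/B = 0.002, so a reported value of 0.002 would correspond to a single bootstrap replicate reaching the observed statistic.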

In summary, our proposed test confirms the validity of the model proposed by Koenker and 
Machado (1999). In addition, from the outcome for Problem 3, it would be sufficient to consider 
a model with nine explanatory variables to explain the growth rate of the Per Capita GDP. 


             Problem 1   Problem 2   Problem 3   Problem 4 
p-values       0.194       0.458       0.440       0.002 

Table 8: p-values obtained by the proposed lack-of-fit test for Problems 1, 2, 3 and 4. 


5 Concluding remarks and extensions 

We proposed a new lack-of-fit test for quantile regression models, together with a bootstrap 
mechanism to approximate the critical values. The bootstrap approximation does not need 
to estimate the conditional sparsity, and was shown to work well in homoscedastic and 
heteroscedastic error distributions and with high-dimensional covariates. 

The proposed test is generally more powerful than its natural competitors, and particularly 
more powerful in the case of a high-dimensional covariate. 

The proposed test was applied to real data, where it was useful for validating well-known 
models in the economic literature, which describe the evolution of the GDP in terms of a 
number of explanatory variables. 

The proposed method can be generalized to test models involving quantiles of different 
orders. The model most often treated in the literature is the multiple quantile linear model, where it 
is assumed that the quantile regression function is linear for a subset of orders $\tau \in \mathcal{T} \subset [0,1]$, 

$$g_\tau(x) = x'\theta(\tau),$$

with coefficients $\theta(\tau)$ depending on the order, $\tau$, of the quantile. The coefficients $\theta(\tau)$ allow 
the covariates to have a different effect depending on the order of the quantile. See 
Escanciano and Goh (2014) for a lack-of-fit test of multiple quantile linear models, or 
Escanciano and Velasco (2010) for a test of multiple quantile models with time series. Our proposed 
method can be generalized to test multiple quantile models in a general framework of 
parametric (possibly nonlinear) quantile regression with heteroscedasticity and without estimating 
unknown quantities. To this end, one would consider a process depending on $(\beta, u)$, as well as on 
$\tau$. We restricted attention to the case of testing a single quantile in order to focus on the performance of the test 
for high-dimensional covariates and other important features of the testing problem. Extension 
to multiple quantile testing is left to future research. Similarly, extensions of the proposed 
method to time series are possible using the results in Escanciano and Velasco (2010). These 
possible extensions show that the concept of projecting the covariate, given by Escanciano (2006) 
to overcome the curse of dimensionality, combined with the bootstrap methodology introduced 
by Feng et al. (2011), provides a promising strategy for checking quantile regression models. 